Depends on What the French Say - Spoken Corpus Annotation with and beyond Syntactic Functions

نویسندگان

  • José Deulofeu
  • Lucie Duffort
  • Kim Gerdes
  • Sylvain Kahane
  • Paola Pietrandrea
چکیده

We present a syntactic annotation scheme for spoken French that is currently used in the Rhapsodie project. This annotation is dependency-based and includes coordination and disfluency as analogously encoded types of paradigmatic phenomena. Furthermore, we attempt a thorough definition of the discourse units required by the systematic annotation of other phenomena beyond usual sentence boundaries, which are typical for spoken language. This includes so called “macrosyntactic” phenomena such as dislocation, parataxis, insertions, grafts, and epexegesis.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Correcting and Validating Syntactic Dependency in the Spoken French Treebank Rhapsodie

This article presents the methods, results, and precision of the syntactic annotation process of the Rhapsodie Treebank of spoken French. The Rhapsodie Treebank is an 33,000 word corpus annotated for prosody and syntax, licensed in its entirety under Creative Commons. The syntactic annotation contains two levels: a macro-syntactic level, containing a segmentation into illocutionary units (inclu...

متن کامل

An annotation scheme for Persian based on Autonomous Phrases Theory and Universal Dependencies

A treebank is a corpus with linguistic annotations above the level of the parts of speech. During the first half of the present decade, three treebanks have been developed for Persian either originally or subsequently based on dependency grammar: Persian Treebank (PerTreeBank), Persian Syntactic Dependency Treebank, and Uppsala Persian Dependency Treebank (UPDT). The syntactic analysis of a sen...

متن کامل

Automatic detection and annotation of disfluencies in spoken French corpora

In this paper we propose a multi-step system for the semiautomatic detection and annotation of disfluencies in spoken corpora. A set of rules, statistical models and machine learning techniques are applied to the input, which is a transcription aligned to the speech signal. The system uses the results of an automatic estimation of prosodic, part-of-speech and shallow syntactic features. We pres...

متن کامل

Syntactic annotation of spoken utterances: A case study on the Czech Academic Corpus

Corpus annotation plays an important role in linguistic analysis and computational processing of both written and spoken language. Syntactic annotation of spoken texts becomes clearly a topic of considerable interest nowadays, driven by the desire to improve automatic speech recognition systems by incorporating syntax in the language models, or to build language understanding applications. Synt...

متن کامل

Detection and Analysis of Paraphrastic Reformulations in Spoken Corpora (Repérage et analyse de la reformulation paraphrastique dans les corpus oraux) [in French]

Our work addresses the automatic detection of paraphrastic rephrasing in spoken corpus. The proposed approach is syntagmatic. It is based on paraphrastic rephrasing markers and the specificities of the spoken language. Manual annotation performed by two annotators provides fine-grained and multi-dimensional description of the reference data. Automatic method is proposed in order to decide wheth...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2010